After half a month of effort in the first half of December, my competition result was not satisfying, but as a first-time participant in a formal competition of this kind I learnt a great deal, so the time was not wasted. I refactored the code I had written, drawing on the open-source code and feature ideas of a contestant who finished 17th, and documented the process here; I think it will be useful to students who do not yet know how to write baseline code for a competition.
While refactoring the code I learnt a lot and corrected some of my earlier ideas about feature extraction for this competition, which noticeably improved the offline results. For example, using only two files, t_user.csv and t_loan.csv, I reached an offline score of 1.7929; during the competition I had heard that scores around 1.80 and 1.79 were achievable with just these two tables.
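To make "only two tables" concrete, here is a minimal sketch of the kind of user-plus-loan aggregate features this refers to. The column names (uid, active_date, loan_time, loan_amount, plannum) are assumptions based on the public competition data, and the specific statistics are illustrative rather than the exact feature set used in this repository.

```python
# A minimal sketch of two-table features (t_user.csv + t_loan.csv).
# Column names are assumed from the public competition data; adjust to the real schema.
import pandas as pd

def two_table_feats(user_path="t_user.csv", loan_path="t_loan.csv",
                    start="2016-08-01", end="2016-11-01"):
    user = pd.read_csv(user_path, parse_dates=["active_date"])
    loan = pd.read_csv(loan_path, parse_dates=["loan_time"])

    # Restrict loans to the feature window (e.g. Aug-Oct for the training set).
    w = loan[(loan["loan_time"] >= start) & (loan["loan_time"] < end)]

    # Per-user loan statistics: frequency, volume, installment plan, recency.
    agg = w.groupby("uid").agg(
        loan_cnt=("loan_amount", "count"),
        loan_sum=("loan_amount", "sum"),
        loan_mean=("loan_amount", "mean"),
        loan_max=("loan_amount", "max"),
        plannum_mean=("plannum", "mean"),
        last_loan=("loan_time", "max"),
    ).reset_index()
    agg["days_since_last_loan"] = (pd.Timestamp(end) - agg["last_loan"]).dt.days
    agg = agg.drop(columns=["last_loan"])

    # Join onto the user table; users with no loans in the window get zeros
    # for all statistics (a simplification).
    feats = user.merge(agg, on="uid", how="left")
    stat_cols = [c for c in agg.columns if c != "uid"]
    feats[stat_cols] = feats[stat_cols].fillna(0)
    return feats
```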
Below is an explanation of what each file in the refactored code does and how the feature variables are named; a rough sketch of how the pieces fit together follows the list.
gen_train_feat.py : Script for extracting features for the training set (data from August, September, and October)
gen_test_feat.py : Script for extracting features for the test set (data from September, October, and November)
util.py : Helper functions used by the other scripts
train.py : Training script
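The sketch below shows one plausible way these files fit together: the same window-based feature extraction is run once over the August to October window (labelled with the following month's loans) for gen_train_feat.py, and once over the September to November window for gen_test_feat.py, after which train.py fits a regressor. The helper names, the label definition (next month's total loan_amount per user), and the xgboost parameters are illustrative assumptions, not the repository's exact code.

```python
# A hypothetical sketch of the monthly windows behind gen_train_feat.py /
# gen_test_feat.py and the fitting step in train.py; the label definition
# and model choice are assumptions.
import pandas as pd
import xgboost as xgb

user = pd.read_csv("t_user.csv", parse_dates=["active_date"])
loan = pd.read_csv("t_loan.csv", parse_dates=["loan_time"])

def window_feats(start, end):
    """Per-user loan aggregates restricted to the [start, end) window."""
    w = loan[(loan["loan_time"] >= start) & (loan["loan_time"] < end)]
    agg = w.groupby("uid")["loan_amount"].agg(
        loan_cnt="count", loan_sum="sum", loan_mean="mean", loan_max="max"
    ).reset_index()
    feats = user.merge(agg, on="uid", how="left")
    stat_cols = ["loan_cnt", "loan_sum", "loan_mean", "loan_max"]
    feats[stat_cols] = feats[stat_cols].fillna(0)
    return feats

def month_label(start, end):
    """Target: each user's total loan_amount inside the label month."""
    w = loan[(loan["loan_time"] >= start) & (loan["loan_time"] < end)]
    return w.groupby("uid")["loan_amount"].sum().rename("label").reset_index()

# gen_train_feat.py: features from Aug-Oct, label from November.
train = window_feats("2016-08-01", "2016-11-01").merge(
    month_label("2016-11-01", "2016-12-01"), on="uid", how="left"
).fillna({"label": 0})

# gen_test_feat.py: features from Sep-Nov; December is what we predict.
test = window_feats("2016-09-01", "2016-12-01")

# train.py: fit on the numeric feature columns and predict next-month totals.
cols = train.drop(columns=["uid", "label"]).select_dtypes("number").columns.tolist()
model = xgb.XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=5)
model.fit(train[cols], train["label"])
test["pred"] = model.predict(test[cols])
```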
The source code and dataset for this project are available here: Click here (alternative link, password: AllenMa).